Density-based Clustering

Density-based methods group data points together based on density instead of distance.

How does it work

define two hyperparameters:
- a distance parameter $ϵ$ = the maximum distance
- a quantity parameter $n$ = the minimum number of examples to put in a cluster
Pick an example $x$ from your dataset at random and assign it to cluster 1, then count how many examples have the distance from $x$ less than or equal to $ϵ$ .
- If this quantity is greater than or equal to $n$ , then put all these $ϵ$ -neighbors to the same cluster 1.
- Examine each member of cluster 1 and find their respective $ϵ$ -neighbors. If some member of cluster 1 has $n$ or more $ϵ$ -neighbors, expand cluster 1 by adding those $ϵ$ -neighbors to the cluster.
Continue expanding cluster 1 until there are no more examples to put in it.
- Pick from the dataset another example not belonging to any cluster and put it to cluster 2. You continue like this until all examples either belong to some cluster or are marked as outliers. An outlier is an example whose ϵ\epsilon-neighborhood contains less than nn examples.”

It uses a parameter called the minimum cluster size (MinClusterSize) in addition to ε and MinPts.

DBSCAN (Density-Based Spatial Clustering of Applications with Noise)
HDBSCAN (Hierarchical Density-Based Spatial Clustering of Applications with Noise)
- is just an extension of DBSCAN that addresses some of its limitations and builds upon the concept of density-based clustering by introducing a hierarchical approach.
OPTICS (Ordering Points To Identify the Clustering Structure)